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Abstract — A new framework for nonlinear system iden- 
tification is presented in terms of optimal fitting of stable 
nonlinear state space equations to input/output/state data, with 
a performance objective defined as a measure of robustness 
of the simulation error with respect to equation errors. Basic 
definitions and analytical results are presented. The utility of 
the method is illustrated on a simple simulation example as 
well as experimental recordings from a live neuron. 

I. Introduction 

Converting numerical data, originating from either phys- 
ical measurements or computer simulations, to compact 
mathematical models is a common challenge in engineering. 
The case of static system identification, where models y — 
h(u) defined by "simple" functions h(-) are fitted to data 
records of u and y, is a major topic of research in statistics 
and machine learning. This paper is focused on a subset 
of dynamic system identification tasks, where state space 
models of the form 

x(t+l) = f(x(t),u{t)), (1) 
y(t) = g(x(t),u(t)), (2) 

where / and g are "simple" functions, are extracted from data 
records of x, y, and u. We will also consider continuous time 
models of the form 

x(t) = f(x(t),u(t)), (3) 

with pj. If the data come from simulations of a complex 
model rather than experiments then this task is referred to 
as model reduction. 

There are two common and straightforward approaches to 
this problem: 

1) Equation-error minimization: (see, e.g., [1], [2]) A 
model is sought which minimizes a cost function of 
the following form 

i 

or similar, over the unknown parameters of /(•). A 
similar optimization can be set up for g(-). This is 
typically very cheap computationally: if /(•) and <?(•) 
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are linear in the unknown parameters then the problem 
reduces to basic least squares. However, if there is no 
incremental stability requirement then small equation 
errors \x(t + 1) — f(x(t),u(t))\ do not imply small 
simulation errors over extended time intervals. For 
large scale and nonlinear problems it is not unusual 
to find unstable models by this method, particularly if 
the true system is not in the model class being searched 
over. 

2) Simulation-error minimization: (see, e.g., [3]) One sets 
up a nonlinear programming problem to find 

where y v is the output of the simulation of the model 
system with a particular set of parameters -q which 
define /(■) and g(-). If successful, this can give a more 
robust fit than equation error. However, even if the 
system equations f(-),g(-) are linear in the unknown 
parameters, the relationship between the unknown pa- 
rameters i] and the long-term simulation y v (t) will be 
highly nonlinear and the optimization nonconvex. For 
systems with a large number of parameters, this can 
make global optimization of simulation error very dif- 
ficult unless good initial parameter guesses are known, 
which is seldom the case when considering black box 
model structures. 
The method proposed in this paper can be considered a 
middle-ground between these two extremes: we formulate 
a convex optimization problem to minimize an upper bound 
on the true simulation error while guaranteeing the stability 
and well-posedness of the identified model. Furthermore, we 
show that that in some simple cases the upper bound is tight. 

While ensuring stability complicates identification of both 
linear and nonlinear models, it is most challenging in the 
nonlinear case. Some recently proposed methods include 
LMI conditions for linear systems [4], convex relaxations for 
linear [5] and Wiener-Hammerstein systems [6], as well as 
passivity-like conditions for linear [7] and nonlinear models 
[8]. 

We do not in general guarantee finding statistically optimal 
or unbiased estimates. However, for nonlinear or high-order 



linear systems stability of the model and reduction of the 
long-term error dynamics are often major problems; these 
have been our primary targets. 

II. Problem Setup 

A. State Space Models 

We examine discrete time (DT) state space models of the 
form 

e(x(t + l)) = f(x(t),u(t)), (4) 
y{t) = g(x(t),u(t)), (5) 

where e : R n ^ W\ f : R n x E m \-> E™, and g : E" x 
E m i-> R fc are continuously differentiable functions such 
that the equation e(z) = w has a unique solution z £ E™ for 
every w £ 1". 

B. Stability 

We consider the DT model |4]),(|5]l stable if the difference 
{yi{t) — yo{t)}t^i i s square summable for every two solu- 
tions (u,x,y) = (ui,xi,yi) and (u,x,y) = (u ,XQ,y ) of 
(p]l,PJ with the same input u\ = uo = u. This definition can 
be qualified as that of global incremental output t2- stability. 

C. Data 

In applications, we expect to have input/state/output in- 
formation available in the form of sampled data Z = 
{z{U)}f =1 , where zfc) = (v(ti),x(ti),u(ti),y(ti)). Here 
x, u, y represent approximated samples of state, input, and 
output respectively. Section [VI-A| will discuss approaches for 
approximating the state of the system from input-output data. 

For the purpose of theoretical analysis, we will assume that 
the input/state/output information is available in the form of 
signal data where v, i, u, and y are signals, i.e. functions 

x,v: T^K™, u: T^K m , y : T i-> R k (6) 

such that 

r={l,...,JV>, v(t-l)=x{t)Vt£{2,...,N}. (7) 

D. Simulation Error 

Given DT signal data Z and functions e, /, g, the simula- 
tion error associated with a model matching (|4),<|5) is defined 

as 

JV 

£ = J2\y(t)-y(t)\ 2 , (8) 
*=i 

where y(t) is defined by (|4),(|5l with u(t) = u(t) and x(l) = 
x(l). ' 

E. Linearized Simulation Error 

A simplified version of the simulation error measure £ is 
the linearized simulation error £° defined in the following 
way. Consider a "perturbed" version of the system equations 

e(xg(t + l)) = f(x e (t),u(t))~ f Q (t), (9) 
ye(t) = g(x e (t),u(t)) - g (t). (10) 



Here 9 £ [0,1], f {t) = (1 - 9)e x (z(t)), and g (t) = 
(1 — 9)e y (z(t)), where e x and e y are the equation errors are 
defined by: 

£x(5) = f(x,u)-e(v), (11) 
e y(z) = 9{x,u)-y. (12) 

We examine the solution (x0,yg) of |9|l,([To]l with xg(l) = 
x(l), u(t) = u(t). 

By construction, yg = y for = 1, and yg = y for 9 = 0. 
We define 

f^zW) = lim-^|y(t)- ye (t)| 2 (13) 
~^ t=i 

to quantify local sensitivity of model equations with respect 
to equation errors. 

Using standard linearization analysis, it is easy to produce 
alternative expressions for £°: 

N 

£° = J2 \G(x(t),u(t))A(t) + e v (~z(t))\ 2 , (14) 
t=i 

where A(-) is defined by 

E(x(t + l))A(t + l) = F(x(t),u{t))A{t) + e x (z(t)), (15) 

with initial condition A(l) = 0, and E = E(x),F = 
F(x,u) and G = G(x 1 u) defined to be the Jacobians (with 
respect to x) of e, / and g respectively. 

F. Optimization Setup 

Within the framework of this paper, we consider efficient 
global minimization of the simulation error £ (over all model 
functions e, /, g, defining a stable system) as an ultimate (if 
perhaps unattainable) goal. We proceed by defining upper 
bounds for £ and £ which can be minimized efficiently by 
means of convex optimization (semidefinite programming). 
We will also prove some theoretical statements certifying 
quality of these upper bounds. 

III. Robust Identification Error 

The dependence of the simulation error (£ or £ °) on 
the coefficients of system equations (j4]),((5]l is complicated 
enough to make it a challenging object for efficient global 
minimization, especially under the stability constraint. The 
objective of this section is to introduce several versions of 
robust identification error (RIE) - a sample-wise measure of 
simulation error, motivated by the idea of using storage func- 
tions and dissipation inequalities to generate useful upper 
bounds of £ and £°. 

A. Global RIE 

The global RIE measure for a DT model (j4j),([5]» is a 
function of the coefficients of (|4},([5), a single data sample 

z = (v, x, u, y) £ E n x E" x E m x R k , (16) 



and an auxiliary parameter Q = Q' > 0, a positive definite 
symmetric n-by-n matrix (for convenience, we only indicate 
the dependence on z and Q): 

£ Q (z) = sup + A, fi) - e{v)\ 2 Q - \5 e \% + \S y \ 2 } . 

A 

(17) 

where |a|g is a shortcut for a'Qa, and 

S y = g(x + A, u) - y, S e = e(x + A) - e(x). (18) 

The following statement explains the utility of the RIE 
measure in generating upper bounds of simulation error. 
Theorem 1: The inequality 



N 



t=l 

holds for every Q — Q' > and signal data 
Proof. By the definition of £Q(z(t)) we have 

\f(x(t)+A,ti(t))-e(m)\Q 
\e(x(t) + A) - e{x{t))\l 

■ ■<> 



(19) 



+ \g(i(t) + A, fi(t)) - y (t)\ 2 < £ Q (z(t)) (20) 



for all A. Let x(t) and y{t) be defined by with u(t) = 

u(t) and x(l) = x(l). Substituting A(t) = x(t) — x(t) into 
(EO) yields 



\e(x(t + l))-e(x(t + l))\ 2 Q 

- \e(x(t)) - e(x(t))\ 2 Q + \y(t) - y(t)\ 2 < Sq{z{t)). (21) 
Summing these inequalities over t and noting: 
\e(x(l))-e(x(l))\ 2 Q = 0, \e(x(N+l))-e(x(N+l))\ 2 Q > 
yields ([19]) . ■ 

B. Local RIE 

The local RIE for a DT model is defined by: 

£° Q (~z) - sup {|FA + e x \% - \EA\ 2 Q + \GA + e y \ 2 } , 

A 

(22) 

and provides an upper bound for the linearized simulation 
error £° according to the following statement. 
Theorem 2: The inequality 



N 



(23) 



t=i 

holds for every Q = Q' > and signal data 
Proof. By the definition of £q\ 

\FA + e x \%- \EA\ 2 Q + \GA + e y \ 2 < £° Q {z) (24) 



holds for all A. Substituting A(t) = A(t) defined by p4) , 
with A(l) = 0, we have: 

\E(m)Mt+l%-\E(m)m\Q+\GA(t)+e y \ 2 < £° Q {~z) 
Summing over t yields (|23]l. ■ 



Note that the supremum in ( |22) i is finite only when the 
matrix 

Rdt = F'QF - E'QE + G'G (25) 

is negative semidefinite. In applications, strict negative defi- 
niteness of the matrix ( |25] l is enforced, to be referred to as 
robustness of the corresponding supremum. 

C. RIE and Stability 

The following theorem shows that global finiteness of the 
local RIE implies global stability of the model d4j,([5]). 

Theorem 3: Let continuously differentiable functions 
f,g,e and matrix Q = Q' > be such that e has a smooth 
inverse e~ (i.e. e~(e(x)) = e(e~(x)) — x for all x g M. n ), 
and £q(c~ (f(x,u)), x,u, g(x,u)) is finite for every x E W 1 , 
u € Kl m . Then system is globally incrementally output 

Instable. 

Proof. Let (u,x,y) = (u Q ,x 0l yo) and (u,x,y) = 
(ui,xi,ui) be two solutions of (|4]),(j5|l with uo = u\ = u. 
For 9 € [0,1] define (x*(9,t),y*(6 7 t)) as the solution of 
Q,(|5J with 

x.(0,l) =6x 1 (l) + (l-6)x (l). 

Then x*(8,t), y*(9,t) are continuously differentiable func- 
tions of 9 e [0, 1] for all integer t > 1, and 

V*(0,t) = yo(t), y*{l,t)= yi (t) Vt>0. 

Differentiating the identities 

e(x.(0,t+l)) = /(ar,(fl, *),«(<)), 

y*{e,t) 

with respect to 9 yields 

d x*{6,t) 

1)9 
dy.(0,t) 
09 



= g(x*(9,t),u(t)) 

= F{x*{6,t),u(t)) 
= G(x*(6,t),u(t)) 



dx*(9,t) 

~d6 ' 
dx*(6,t) 

Since the finiteness of £q{e~ (f(x, u)), x, u, g(x, u)) implies 
negative semidefiniteness of the quadratic form 

a(A) = \F(x,u)A\ 2 Q - \E(x)A\ 2 Q + \G(x,u)A\ 2 

for all x,u, we have 

w(t) < V{t) - V(t + 1) (26) 

for all t > 0, where 



w(t) 



V{t) 



dy*(9,t) 



09 



d6>\y40,t)-y*(l,t)\ 2 , 



09 



Summing (po} over t we find: 



JV 



>(t)<V(l)-V(N + l), 



and as V(N + 1) > the sum of w(t) is finite for all N. 
Since w(t) > \yo(t) — yi(t)\ 2 , this proves incremental L2 
output stability. ■ 



IV. A Convex Upper Bound for Optimization 

The results of the previous section suggest minimization 
(with respect to e,f,g,Q) of the sum of RIE over the 
available data points as an approach to system identification. 
However, in general, the RIE functions are not convex with 
respect to e, /, g and Q. In this section, we use the inequality 

- a'Qa < A'Q~ l A - 2A'a, (27) 

which, due to the identity 

A'Q _1 A - 2A'a + a'Qa = \a- Q^A\ 2 Q , 

is valid for all a, A 6 R n and a real symmetric n-by-n 
matrix Q such that Q = Q' > 0, to derive a family of 
upper bounds for the RIE functions. The upper bounds will 
be jointly convex with respect to e, /, g, and P = Q^ 1 > 0. 

A. Upper Bounds for Global RIE in Discrete Time 

Given a symmetric positive definite n-by-n matrix Q and 
functions e : R™ H> R", / : R"xR m 4 R" let 

S v = f(x + A,u)-e(v). 



Applying (27i with a = S e , to the — \5e\q term in the 



definition of £q(z) yields £q(z) < £q(z) where 
£ Q (z) = su V {\5 v \ 2 Q + \A\% - 2A'S e + \6 y \ 2 } 



(28) 



and P = Q 1 . The function £q(z) serves as an upper bound 
for £q(z) that is jointly convex with respect to e, /, g, and 

p = q- 1 >o. 

B. Upper Bounds for Local RIE in Discrete Time 

Given a symmetric positive definite n-by-n matrices Q 

\ / : R" x 



and functions e 



let 



A e 

A r 
A„ 



E(x)A, 
F(x,u)A + e x , 
G(x,u)A + t y . 



Applying (27 1 with a = A e , to the — |A e |?> term in the 



definition of £q{z) yields £q{z) < £q(z) where 

£° Q (z) = sup{|A„|3 + \A\ 2 P - 2A'A e + |A y | 2 } , (29) 



with P = Q 1 . The function £q{z) serves as an upper bound 

P = Q- 1 > 0. 



for £q(z) that is jointly convex with respect to e, /, g, and 



C. Well-Posedness of State Dynamics 

The well-posedness of state dynamics equation Q is 
guaranteed when the function e : K ra R™ is a bijection. 
The well-posedness of Q is implied by robustness of the 



supremum in the definition i29\ of the upper bound £% of 



the local RIE £q, i.e. by strict negative definiteness of the 
matrix: 

R dt = F'QF + P-E' -E + G'G. (30) 
Note that this is not guaranteed by the robustness of (|22|. 



Theorem 4: Let e : R ra H > R" be a continuously differ- 
entiable function with a uniformly bounded Jacobian E(x), 
satisfying: 

E{x) + E{x)' > 2r I, Vx € R" (31) 

for some fixed r > 0. Then e is a bijection. 
Proof. 

Consider the task of minimizing \e(x) — z\ 2 with respect 
toieM" for a given z e R n . Since E + E' > 2r Q I implies 



-A'[e(x + A0)-e(x) 



= A'E{x + A9)A 
> r \A\ 2 , 



we have 



\e(x + A) - e(x)\ > r \A\ Vx,A, 



(32) 



hence |e(x)| — > oo as |ai| — > oo, and the minimum of 
|e(x) — z\ 2 is achieved at some x — x . Then the first 
order optimality condition (e(xo) — z)'E(xo) = implies 
e(a^o) = z. To show that the equation e(x) = z has a unique 
solution, use ( [32j >. ■ 

When e(x) is nonlinear one can solve for x such that 
\x — xq\ < e (with e(xo) = z) via the ellipsoid method, 
or related techniques. Given a guess x, we know the true 
solution lies in a sphere: \e(x) — z\ > Tq\x — Xq\. Further, 
we have a cutting plane oracle: (x — Xo)'(e(x) — z) > 0. 

D. Coverage of Stable Linear Systems 

Since we have produced an upper bound for the simulation 
error both through the introduction of £q(z) and £q(z), it 
is desirable to check whether a basic class of systems will 
be recovered exactly. 

Consider a linear system 

x(t + 1) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t) 

where x £ R" and u g R m . Define the "data matrices" from 
an experiment of length N to be X := [x(ti), . . . , i(tjv)], 
U := [u(ti), . . . , u(tN)]. Suppose we have fit a linear model 

Ex(t + 1) = Fx(t) + Lu(t), y(t) = Gx{t) + Hu(t). 

We consider a linear system to have been recovered exactly 
by the model if G = C,D = H,EB = L, and EA = F. 

Theorem 5: For data generated from a stable DT linear 
system with zero noise, if the data matrix [X' , U']' is of rank 
at least n + m, then the linear system is recovered exactly 
and 

£ Q (z)=£ Q (z) = 0. 

Note that by construction for the case of a linear model 
£q = £q. In order to prove this theorem, we will use the 
following lemma: 

Lemma 1: For any Schur matrix A there exists E, F and 
Q > such that EA = F satisfying M = M' < where 



M 



F'QF + Q- 1 -E' -E + G'G. 



(33) 



Proof. Since A is Schur, there exists a matrix R > such 
that A'RA -R< -G'G. Let E = R,F = JL4, Q = R~\ 



Substituting into (33i results in 

M = A'iLA - 2? + G'G < 

where the last inequality follows by construction of R. ■ 

Proof of Theorem [P] Using the choice of 25, F, Q in 
Lemma [T] since the data is noise free and EA = F we 
have e x = e y = 0. As a result, it follows from Lemma[T|that 
£q(z) is the supremum of a homogeneous negative-definite 
quadratic form in A, hence has a value of zero. Similarly, 



with zero noise (22 1 is the supremum of a homogeneous 



quadratic form in A and since £q{z) < £q(z) = hence 
£q (z) = 0. The rank condition on the data matrices ensures 
that if robust equation error is zero, then the true system is 
recovered. ■ 

V. Continuous Time Results 
For continuous time (CT) models, Q is replaced by 

d 



dt 



e{x{t)) 



f(x(t),u(t)), 



(34) 



or, equivalently, 

E(x(t))x(t) = f(x(t),u(t)), 

where E(x) is the Jacobian of e(-) at x. Naturally, E(x) 
is required to be non-singular for all x. We consider the 
model (|5]l,([34| is stable if the difference y\ — yo is square 
integrable for every two solutions (it, x, y) — (ui,x\,yi) 
and (u,x,y) — (uo,xo,yo) of |5|),([34]l with the same input 
U\ = Uq = u. 

We expect to have input/state/output information 
in the form Z — {z(ti)}fL 1 , where z(U) = 
(v(ti),x(ti),u(ti),y(ti)). Here X,u,y represent state, 
input and output respectively, whereas v ~ x. 

For the purpose of theoretical analysis, we will assume that 
the input/state/output information is available in the form of 
signals, that is functions: 



(35) 



such that 



T=[0,T], 5(i) = -i(t),VtG[0,T]. (36) 

In practice we have only sampled data, but for theoretical 
convenience we assume u(t) and v(t) exist as suitably 
smooth functions (e.g. piecewise continuous) interpolating 
the samples. 

A. Simulation Error 

Given CT signal data Z, and functions e, /, g, the sim- 
ulation error associated with a model matching (|5|),([34| is 
defined as „ 



£ = 



\y{t)-y{t)\ 2 dt, 



(37) 



where y is defined by |5|),([34]l with u(t) = u(t) and x(0) 
5(0). ' 



B. Linearized Simulation Error 

Similar to the DT case we examine a "perturbed" version 
of the system equations: 



d 



e(xg(t)) = f(x e (t),u(t)) - f (t), (38) 
y B (t) = g(x e (t),u(t))- g (t). (39) 



Here € [0, 1], f (t) = (1 - 9)e x (z(t)), and g Q {t) = (1 - 
9)e y (z(t)). We examine the solution (yg,xg) with x$(0) = 
x(0) and u(t) = u(t). For the CT case, the equation error 
e x of ( fTl) is replaced by: 



£x{z) = f(x,u) - E(x)v 
Note that for 9 = we have yg = y. 



(40) 



Via a linearized analysis similar to Section II-E we have: 



£° 



o 



\G(x(t),u(t))A(t) + e v {z(t))\ 2 dt, 



(41) 



where A(-) is defined by 

d 



dt 



[E{x(t))A(t)] = F(x{t), u(t))A(t) + e x (z(t)), (42) 



with initial condition A(0) = 0. 

C. Global RIE in Continuous Time 

The global RIE error measure for a CT model p),(|3"4"|i is 
similarly a function of e, /, g and Q = Q' > 0, as well as a 
single data-point z: 

£ Q {~z) = sup {2S' e Q[f(x + A, u) - E{x)v] + \S y \ 2 } . (43) 

A 

Theorem 6: The inequality 

rT 



£ < 



£ Q (z{t))dt, 



(44) 



where z(t) — (v(t),x(t),u(t),y(t)), holds for every Q = 
Q' > and signal data 

Proof. By the definition of £q(z) we have 

W e Q\f(? + A,u) - E(x)v] + \A y \ 2 < £ Q {z) (45) 

for all A. Let (x, y) be defined by (j5]),([34| with u(t) = u{t) 
and ar(0) = x(Q). Substituting A = x(t) - x(t) into ( |45| ) 
yields 



d\e(x(t))-e(x(t))\l 
dt 



+ \y(t)-y(t)\ 2 <£ Q (z(t))- 



Integrating this over the interval t £ [0, T] yields (44 



Theorem [6] suggests minimization of the integral in ( 44 1 
as an easier-to-handle alternative to minimization of the 
simulation error. In the case when system information comes 
in the sampled data format Z = {z(ti)}^ =1 , the theorem 
suggests minimization of the sum £Q(z(ti)) with respect to 
Q = Q' > 0, e, /, g as a system identification algorithm. 



D. Local RIE in Continuous Time 

The local RIE error measure for a CT model (|5|),(|34]i is 
denned by 

£° Q {z) = sup {2(EA)'Q(FA + e x ) + \GA + e y \ 2 } (46) 

A 

and provides an upper bound for the linearized simulation 
error £° according to the following statement. 
Theorem 7: The inequality 

£" f T £° Q {z{t))dt, (47) 



Applying (27 i with a = A e , to the second term in the 



holds for every Q — Q' > and signal data C35),C36). 

Note that the supremum in ( |46) > is finite only when the 
matrix 

R ct = E'QF + F'QE + G'G (48) 
is negative semidefinite. 

E. RIE and Stability 

A similar statement to Theorem [3] is available in the CT 
case: 

Theorem 8: Let two times continuously differentiable 
functions e,f,g and matrix Q = Q' > be 
such that E(x) is invertible for all x £ W 1 ), and 
£q(E(x)^ 1 f(x,u),x,u, g(x,u)) is finite for every x £ M", 
u £ W n . Then system |^),p4[) is globally incrementally 
output C2-stable. 

F. Upper Bounds for Continuous Time Global RIE 

Given a symmetric positive definite n-by-n matrix Q and 

/: M"x 



functions e : 



i — ^ 



let 



<5+ = 5 e + f(x + A, u) - E(x)v, 
8~ = 5 e - f(x + A, u) + E(x)v, 

where E is the Jacobian of e. 



Applying ( 27 1 with a = S e , to the second term in the 



expression on the right side of the identity 

45' e Q[f@ + A, u) E(x)v] = \St\ 2 Q - |^| 2 Q 
yields £q(z) < £q(z) where 



sup ■ 

A 



lei!' 



A'S: + \S y 



(49) 



that P = Q 1 . The function £q{z) serves as a CT upper 
bound for £q(z) that is jointly convex with respect to e, /, 
g, and P = Q- 1 > 0. 

G. Upper Bounds for Continuous Time Local RIE 

Given a symmetric positive definite n-by-n matrices Q 
and functions e : R n H> R n , f : W n x W m i-> R n let 

A+ = E(x)A + F(x,u)A + e a; , 
A- = E(x)A- F(x,u)A-e x , 

where E, F, G are the Jacobians of e, /, g with respect to x. 



expression on the right side of the identity 

4(EAYQ[FA + e x } = \At\ 2 Q -\A-\l 
yields £q(z) < £q(z) where 

%(z) = sup I lAtl ® + - A-A- + |A,p| , (50) 

with P = Q^ 1 . The function £q(z) serves as a CT upper 
bound for £q(z) that is jointly convex with respect to e, /, 
g, and P = Q- 1 > 0. 

H. Well-Posedness of State Dynamics 



A CT model is well posed so long as e from (34i has a 
non-singular Jacobian E = E(x) at every point x £ M. n . 
Invertibility of the Jacobian at a given point x is guaranteed 
by robustness of the supremum in the definition ( |4o*| of the 
local RIE £q (i.e. strict negative definiteness of R ct in (|48|). 

/. Recovery of Linear Systems 

A result similar to Theorem [9] can also be shown in the 
CT case based on the following lemma. 

Lemma 2: For any Hurwitz matrix A there exists E, F 
and Q = Q' > such that F = EA and M = M' < 
where: 

M :={E + F)'Q(E + F) + Q' 1 

- (E - F)' - (E - F) + 2G'G. (51) 

Proof. Since A is Hurwitz, there exists an R — R' > 
such that A'R + RA < -G'G. Take E = (I - AYR, F = 
(I - A)'RA, and Q = ((I - A)'R(I - A))- 1 Note that as 
A is Hurwitz, I — A will be nonsingular. Substituting these 
choices into ( |5"T| we have: 

M = 2A'R + 2RA + 2G'G < 

where the last inequality holds by the construction of R. ■ 

We again consider "data matrices" X := 
[x(ti), x(t N )], and U := [u{h), u{t N )}. 

Theorem 9: For data generated from a stable CT linear 
system with zero noise, if the data matrix [X' , U'}' is of rank 
at least n + m, then the linear system is recovered exactly 
and 

£ Q {z)=£ Q (z) = 0. 

Proof. The proof is nearly identical to that of Theorem ([9j, 
using Lemma (|2]l as necessary. ■ 

VI. Implementation Details 

We now discuss practical considerations for data prepara- 
tion and minimization of the upper bounds using semidefinite 
programming. 



A. Approximating States 

The RIE formulation assumes access to approximate state 
observations, x(t). In most cases of interest, the full state 
of the system is not directly measurable. In practice, our 
solutions have been motivated by the assumption that future 
output can be approximated as a function of recent input- 
output history and future input. To summarize recent history, 
we have had success applying linear filter banks, as is 
common in linear identification (e.g. Laguerre filters [9]). 

Even in fairly benign cases one expects the input-output 
histories to live near a nonlinear submanifold of the space 
of possible histories. As a result, linear projection based 
methods may require excessive dimensionality to approxi- 
mate the state of the system. Connections between nonlinear 
dimensionality reduction and system identification are being 
explored in the manifold learning community, such as [10] 
and [11]. 

For CT identification estimating the rates of the system, 
v(t) = 4fX(t), presents an additional challenge. For true sys- 
tem outputs, this can be approached via differentiation filters, 
or noncausal smoothing before numerical differentiation. 
Approximating additional states through filter banks allows 
the rates of these variables to be calculated analytically. 

B. Quality of Fit with Semidefinite Programs 

For any tuple of data, z(tj), the upper bound on the local 
RIE is the supremum of a concave quadratic form in A. 
So long as e, / and g are chosen to be linear in the decision 
variables, this upper bound can be minimized by introducing 
an LMI for each data-point using the Schur complement. We 
introduce a slack variable Si for each data-point: 



(52) 



which is a convex constraint and optimize for ^ Sj — !• min. 

Similarly, the upper bound on the global RIE is a function 
of A for fixed z(U). If we take e, / and g to be polynomials 
or rational functions with fixed denominators then the upper 
bound will be a polynomial or rational function in A. As a 
result, we can minimize this function by introducing a sum- 
of-squares (SOS) constraint [12]. We again introduce a slack 
variable sf. 

> £q(z(U)), (53) 

and optimize for ^\ Sj — > min. This equation will be 
polynomial in A and quadratic in n+1 other variables due to 
the Schur complement. In most cases, replacing the positivity 
constraint with a SOS constraint is another convex relaxation. 

When fitting a linear (affine) model for |4]i,(|5]) or |5},((34]) 
it is interesting to note that £ = £° and further the SDP can 
be posed to grow only with the dimension of the state, rather 
than the number of data points. For example, in the linear 
DT case one can compute the supremum ( p9] l (assuming it 
is finite) as a quadratic form in the data: 
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When minimizing the RIE over many data-points one can use 
the cyclic property of trace to restate the problem in terms 
of the empirical covariance matrix. Using an eigenvalue 
decomposition of the correlation matrix yields an equivalent 
optimization problem with no more than 2n + m + k LMI 
constraints. 

C. Choice of Basis and Stability 

Global finiteness of the the upper bound £q guarantees 
stability. For a fixed (x,u), boundedness can be verified via 
an LMI. Taking a polynomial or rational function basis for 
e, / and g, we can verify this LMI for all (x, u) using a 
SOS constraint. Global verification of the inequalities places 
some constraints on the degrees of these polynomials. For 
example, in DT the degree of E(x) must be able to be twice 
that of F(x, u) for the inequality to hold globally. 

In continuous time, we use the following parametrization 
to allow for global stability verification: 



e{x) 



e(x) 
q(x)' 



f(x,u) 
q(x)p(u) 



(54) 



Here q(x) : W 1 i— > E is a fixed polynomial of degree 2d x 
in each X{ such that q(x) > 1. Similarly p(u) : W n i->- K 
is of degree 2d u in each Ui, and p(u) > 1. The numerators, 
f(x,u) and e{x) are polynomials whose coefficients are 
decision variables. Both e(x) and f(x, u) are degree 2d x + 1 
in each xi and / is of degree 2d u in each u.i. 

With these choices of degrees, it is possible for the convex 
relaxation to be satisfied for all (x, u). The positivity of 
the expression can be tested via a SOS decomposition. In 
particular, we choose q(x) and p(u) to be nearly constant 
over the range of the observed data. For example, we take: 



q(x) = (l + \\x\\l) d - p(u) = (l + \Hi)^ 



(55) 



In general, centering and normalizing the data drastically 
improves numerical properties of the method. Here, rescaling 
the data such that it lies in a unit ball around the origin makes 
this choice of q and p apply more generally. 

When global stability is not required, care must be taken to 
ensure that solutions to the implicit form equations still exist. 
In continuous time this is guaranteed if E(x) is invertible for 
all x, and similarly it is guaranteed if e(x) is invertible in 
discrete time. Both of these constraints can be satisfied by 
requiring E(x)+E(x)' > 2r$I, which can again be enforced 
using a SOS constraint. 

VII. Examples and Discussion 

A. Stability and Noise 

When confronted with large measurement noise, we have 
observed that RIE minimization produces models which are 
more stable (e.g. damped for linear systems) than the system 
being fit. This is most evident in highly resonant, or nearly 
marginally stable systems. In these situations, we have had 
success minimizing equation error while simply requiring the 
local RIE to be finite at the sample points. Mitigating this 
effect is a focus of future work. 
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Fig. 1. A comparison of equation error minimization and local RIE 
minimization on a simulated, second-order, nonlinear discrete time system. 
The true system response to validation input is compared to an equation 
error fit (top) and local RIE fit (bottom). The system is not within the 
model class being searched over. 



Fig. 2. We compare several fits of the post-spike dynamics of a 
live neuronal cell on validation data. The "Robust Fit" corresponds to 
minimizing the local RIE, and is compared to both linear and nonlinear 
fits minimizing equation error. By t = 100, the nonlinear equation error fit 
has diverged. The linear fit does not capture the steep descent at t = 0, nor 
does it replicate the long term behavior. 



B. Simulated DT Example 

We consider a second-order nonlinear simulated discrete 
time system: 
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1). For training 



4sin(27r^t 2 ) 



where x = [xi(t) X2(t)]' and Vi(t) = Xi(t- 
we excite the system with a chirp: u(t) = 
for * G {1, ... , 500}. We observe y(t) = x(t) = x{t) + w(t), 
where w(t) is zero mean, Gaussian i.i.d. measurement noise 
with covariance = 0.0025/. 

We fit a model |4]),(|5) with <?(•,•) fixed a priori to be 
g(x,u) = x. We choose e(-) to be cubic, and /(•,•) to 
be a linear combination of u and the monomials up to 
total degree 7 in Xi. With these choices the true system 
is outside the model class. We compare minimizing the 
local RIE and minimizing equation error. In both cases, we 
restrict E + E' > 21 to remove the scale invariance of 
the problem. Figure ([T| presents the response of the true 
system and models for the input u tetit (t) 
overie {!,..., 200}. 



4sin(27r 5 i 5 t) 



C. Modeling of Post-spike Dynamics in Live Neurons 

Our second example is drawn from the task of identifying 
the response of the membrane potential of a live neuron. 
Details of the experimental procedure are given in the 
appendix. In particular, we are interested in identifying the 
dynamics of the neuron immediately following an action 
potential. 

We excite the neuron with 27 separate multisine input 
currents. The excitation is applied via a zero-order hold. The 
response is the sampled membrane potential of the neuron, 
y{t). Both measurement and control have a sampling rate 



of 10 kHz. This data set consists of 22 spikes which were 
separated into equal size training and testing sets. 

To achieve a 3rd order CT fit of the system, we pass 
the observed output voltage, y(t), through a filter bank 
determined by the first two Laguerre functions with a pole 
at 300 radians per second [9]. The original voltage and the 
output of this filter bank give us x(t) G M 3 . To compute v(t) 
we apply a noncasual regularized smoothing to the observed 
output and differentiate numerically. For our model structure 
we choose e, / polynomial in each Xi (degree 4) and / affine 
in u. As our observation is a state, we fix our model's g(x, u) 
to be the membrane potential. 

As the response is nearly periodic, we avoid repetitive 
data by picking approximately 500 data points uniformly 
spread throughout the (x,v,u) space. We minimize £q. 
For comparison, we also fit a model of the same structure 
minimizing the equation error, ^\ |e x (z(ii))| 2 — > min. In 
both cases, we insist on an invertible Jacobian E(x) by 
requiring E(x) + E(x)' > 10~ 3 / with 8 = le - 3. 

Figure [2] plots a neuronal response from the test set and 
the result of simulating the models from the same initial 
conditions. Also included is a first order DT model fit using 
equation error (CT and higher order linear equation error fits 
led to unstable models). 

Appendix 

A. Live Neuron Experimental Procedure 

Primary rat hippocampal cultures were prepared from PI 
rat pups, in accordance with the MIT Committee on Animal 
Care policies for the humane treatment of animals. Dissec- 
tion and dissociation of rat hippocampi were performed in a 
similar fashion to [13]. Dissociated neurons were plated at a 



density of 200K cells/mL on 12 mm round glass coverslips 
coated with 0.5 mg/mL rat tail collagen I (BD Biosciences) 
and 4 ^g/mL poly-D-lysine (Sigma) in 24-well plates. After 
2 days, 20 Ara-C (Sigma) was added to prevent further 
growth of glia. 

Cultures were used for patch clamp recording after 14 days 
in vitro. Patch recording solutions were previously described 
in [14]. Glass pipette electrode resistance ranged from 2- 
4 MO. Recordings were established by forming a Gil seal 
between the tip of the pipette and the neuron membrane. 
Perforation of the neuron membrane by amphotericin-B (300 
^g/mL) typically occurred within 5 minutes, with resulting 
access resistance in the range of 10-20 Mil. Recordings with 
leak currents smaller than -100 pA were selected for analysis. 
Leak current was measured as the current required to voltage 
clamp the neuron at -70 mV. Synaptic activity was blocked 
with the addition of 10 fiM CNQX, 100 fjM APV, and 10 fjM 
bicuculline to the bath saline. Holding current was applied 
as necessary to compensate for leak current. 
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